We propose a novel few-shot action recognition framework that enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. The focal point of our approach is a novel spatio-temporal enrichment module that aggregates spatial and temporal contexts with dedicated local patch-level and global frame-level enrichment sub-modules. Local patch-level enrichment captures the appearance-based characteristics of actions. On the other hand, global frame-level enrichment explicitly encodes the broad temporal context, thereby capturing the relevant object features over time. The resulting spatio-temporally enriched representations are then utilized to learn the relational matching between query and support action sub-sequences. We further introduce a query-class similarity classifier on the patch-level enriched features to enhance class-specific feature discriminability by reinforcing the feature learning within the proposed framework. Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51, and UCF101. Our extensive ablation study reveals the benefits of the proposed contributions. Furthermore, our approach sets a new state of the art on all four benchmarks. On the challenging SSv2 benchmark, our approach achieves an absolute gain of 3.5% in classification accuracy over the best existing method in the literature. Our code and models will be publicly released.
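The global frame-level enrichment described above encodes broad temporal context into each frame's features. A minimal illustrative sketch of that idea is single-head self-attention over the frame axis with a residual connection; the weight matrices and dimensions below are hypothetical placeholders, not the paper's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_level_enrichment(frames, rng):
    """Enrich per-frame features with global temporal context via
    single-head self-attention (illustrative stand-in for a global
    frame-level enrichment sub-module)."""
    T, d = frames.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = frames @ Wq, frames @ Wk, frames @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))   # (T, T) temporal attention map
    return frames + attn @ v               # residual enrichment

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))      # 8 frames, 16-dim features
enriched = frame_level_enrichment(frames, rng)
```

Each output frame is now a mixture of all frames in the clip, which is the sense in which temporal context becomes "global".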
translated by Google Translate
The lack of fine-grained joints (facial joints, fingers) is a fundamental performance bottleneck for state-of-the-art skeleton-based action recognition models. Despite this bottleneck, the community's efforts appear to be invested only in devising novel architectures. To specifically address this bottleneck, we introduce two new pose-based human action datasets - NTU60-X and NTU120-X. Our datasets extend the largest existing action recognition dataset, NTU-RGBD. In addition to the 25 body joints per skeleton in NTU-RGBD, the NTU60-X and NTU120-X datasets include finger and facial joints, enabling richer skeleton representations. We appropriately modify state-of-the-art approaches to enable training with the introduced datasets. Our results demonstrate the effectiveness of these NTU-X datasets in overcoming the aforementioned bottleneck and improving performance on the previously worst-performing action categories. Code and pretrained models can be found at https://github.com/skelemoa/ntu-x.
Applying machine learning to domains like Earth Sciences is impeded by the lack of labeled data, despite the large corpus of raw data available in such domains. For instance, training a wildfire classifier on satellite imagery requires curating a massive and diverse dataset, an expensive and time-consuming process that can span weeks to months. Searching for relevant examples in over 40 petabytes of unlabelled data requires researchers to hunt for such images manually, much like finding a needle in a haystack. We present a no-code end-to-end pipeline, Curator, which dramatically reduces the time needed to curate an exhaustive labeled dataset. Curator is able to search massive amounts of unlabelled data by combining self-supervision, scalable nearest neighbor search, and active learning to learn and differentiate image representations. The pipeline can also be readily applied to solve problems across different domains. Overall, the pipeline makes it practical for researchers to go from a single reference image to a comprehensive dataset in a small fraction of the usual time.
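The retrieval step at the heart of such a pipeline - starting from one reference image and ranking unlabelled examples by similarity in a learned embedding space - can be sketched with exact cosine nearest-neighbor search. This is a toy stand-in: the embeddings are random stand-ins for self-supervised features, and a real system at petabyte scale would use an approximate, scalable index.

```python
import numpy as np

def nearest_neighbors(reference, corpus, k=3):
    """Rank corpus embeddings by cosine similarity to one reference
    embedding (exact search; a scalable pipeline would use an ANN index)."""
    ref = reference / np.linalg.norm(reference)
    db = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = db @ ref
    order = np.argsort(-sims)[:k]          # indices of the k most similar items
    return order, sims[order]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((100, 32))              # stand-in image embeddings
reference = corpus[7] + 0.01 * rng.standard_normal(32)  # near-duplicate of item 7
idx, sims = nearest_neighbors(reference, corpus, k=3)
```

The retrieved items would then be labeled and fed back as training signal, which is where the active-learning loop takes over.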
Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined RepDIB, to learn structured factorized representations. Exploiting the expressiveness brought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
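One common way to impose a variational information bottleneck, as referenced above, is to encode the observation into a diagonal Gaussian, sample with the reparameterization trick, and penalize the KL divergence to a standard-normal prior. The sketch below shows that generic recipe only; it is not RepDIB's specific discrete/factorized design, and the weight matrices are hypothetical.

```python
import numpy as np

def gaussian_bottleneck(x, W_mu, W_logvar, rng):
    """Generic variational information bottleneck: encode x to a diagonal
    Gaussian, sample z, and return the KL penalty to N(0, I)."""
    mu, logvar = x @ W_mu, x @ W_logvar
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterization
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return z, kl

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))            # 4 observations, 8-dim
W_mu = rng.standard_normal((8, 2)) * 0.1   # compress to a 2-dim latent
W_logvar = rng.standard_normal((8, 2)) * 0.1
z, kl = gaussian_bottleneck(x, W_mu, W_logvar, rng)
```

Adding the KL term to any self-supervised objective pressures the latent to keep only information the downstream loss actually needs, which is the compression effect the abstract describes.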
We demonstrate a Physics-informed Neural Network (PINN) based model for real-time health monitoring of a heat exchanger, which plays a critical role in improving the energy efficiency of thermal power plants. A hypernetwork-based approach is used to enable the domain-decomposed PINN to learn the thermal behavior of the heat exchanger in response to dynamic boundary conditions, eliminating the need to re-train. As a result, we achieve orders-of-magnitude reduction in inference time in comparison to existing PINNs, while maintaining accuracy on par with physics-based simulations. This makes the approach very attractive for predictive maintenance of the heat exchanger in digital twin environments.
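The hypernetwork idea above is that a second network maps a boundary-condition encoding to the entire weight set of the target PINN, so a new operating condition costs one forward pass instead of re-training. A minimal sketch of that mechanism, with made-up sizes and a made-up 3-dim boundary-condition encoding (not the paper's architecture):

```python
import numpy as np

def hypernet_weights(bc, H1, H2, shapes):
    """Map a boundary-condition vector to the flattened weights of a
    small target network, then reshape into per-layer matrices."""
    h = np.tanh(bc @ H1)
    flat = h @ H2
    weights, i = [], 0
    for r, c in shapes:
        weights.append(flat[i:i + r * c].reshape(r, c))
        i += r * c
    return weights

def target_net(x, weights):
    """Tiny stand-in for the PINN surrogate."""
    for W in weights[:-1]:
        x = np.tanh(x @ W)
    return x @ weights[-1]

rng = np.random.default_rng(0)
shapes = [(1, 8), (8, 1)]                  # hypothetical 1 -> 8 -> 1 surrogate
n_params = sum(r * c for r, c in shapes)
H1 = rng.standard_normal((3, 16)) * 0.5    # hypernetwork layers
H2 = rng.standard_normal((16, n_params)) * 0.5
bc = np.array([1.0, 0.5, -0.2])            # hypothetical boundary-condition encoding
w = hypernet_weights(bc, H1, H2, shapes)
y = target_net(np.linspace(0, 1, 5).reshape(-1, 1), w)
```

Changing `bc` instantly yields a new set of surrogate weights, which is what makes the inference-time speedup possible for dynamic boundary conditions.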
We study politeness phenomena in nine typologically diverse languages. Politeness is an important facet of communication and is sometimes argued to be culture-specific, yet existing computational linguistic study is limited to English. We create TyDiP, a dataset containing three-way politeness annotations for 500 examples in each language, totaling 4.5K examples. We evaluate how well multilingual models can identify politeness levels -- they show a fairly robust zero-shot transfer ability, yet fall significantly short of estimated human accuracy. We further study mapping the English politeness strategy lexicon into nine languages via automatic translation and lexicon induction, analyzing whether each strategy's impact stays consistent across languages. Lastly, we empirically study the complicated relationship between formality and politeness through transfer experiments. We hope our dataset will support various research questions and applications, from evaluating multilingual models to constructing polite multilingual agents.
Bayesian inference offers principled tools to tackle many critical problems with modern neural networks, such as poor calibration, poor generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference and estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal, which can be difficult to approximate with standard variational inference, and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data, and provides uncertainty estimates which lead to better performance in downstream tasks.
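The traditional baseline the abstract contrasts against - sampling dropout masks independently from a fixed Bernoulli at test time and reading uncertainty off the spread of predictions - can be sketched as below. GFlowOut would replace the fixed mask distribution with a learned posterior; this code shows only the classic Monte Carlo dropout setup, with made-up network sizes.

```python
import numpy as np

def mc_dropout_predict(x, W1, W2, p=0.5, samples=200, rng=None):
    """Monte Carlo dropout: average predictions over independently
    sampled Bernoulli masks; the per-sample spread serves as an
    uncertainty estimate."""
    rng = rng or np.random.default_rng(0)
    preds = []
    for _ in range(samples):
        h = np.maximum(x @ W1, 0.0)              # hidden layer (ReLU)
        mask = rng.random(h.shape) > p           # independent, fixed-distribution mask
        preds.append((h * mask / (1 - p)) @ W2)  # inverted-dropout scaling
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8))
W1, W2 = rng.standard_normal((8, 16)), rng.standard_normal((16, 1))
mean, std = mc_dropout_predict(x, W1, W2)
```

Because every mask here is drawn from the same fixed distribution regardless of the input, it cannot exploit sample-dependent information - exactly the limitation GFlowOut targets.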
Shared control can assist teleoperated object manipulation by helping to carry out the user's intent. For this, robust and prompt intent estimation is needed, which relies on behavioral observations. Here, an intent estimation framework is presented that uses natural gaze and motion features to predict the current action and the target object. The system was trained and tested in a simulated environment with pick-and-place sequences produced in a relatively cluttered scene and with both hands, where the second hand could also be used. Validation was conducted across different users and hands, achieving good accuracy and earliness of prediction. An analysis of the predictive power of the individual features shows the dominance of the grasping trigger and the gaze features in the early identification of the current action. In the current framework, the same probabilistic models can be used for the two hands working in parallel and independently, while a rule-based model is proposed to identify the resulting bimanual action. Finally, the limitations and perspectives of this approach toward more complex, full-bimanual manipulation are discussed.
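A toy illustration of fusing gaze and motion cues into a target prediction, in the spirit of the probabilistic models mentioned above: assign each candidate object a Gaussian likelihood under the gaze point and the hand position, then normalize into a posterior. The distance-based likelihoods and the noise scales are illustrative assumptions, not the framework's actual model.

```python
import numpy as np

def target_posterior(gaze_point, hand_pos, objects, sigma_g=0.05, sigma_h=0.2):
    """Combine a gaze likelihood and a hand-distance likelihood per
    candidate object into a normalized posterior over targets."""
    d_gaze = np.linalg.norm(objects - gaze_point, axis=1)
    d_hand = np.linalg.norm(objects - hand_pos, axis=1)
    log_lik = -0.5 * (d_gaze / sigma_g) ** 2 - 0.5 * (d_hand / sigma_h) ** 2
    log_lik -= log_lik.max()                   # numerical stability
    post = np.exp(log_lik)
    return post / post.sum()

objects = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])  # candidate targets (2D)
post = target_posterior(gaze_point=np.array([0.49, 0.01]),
                        hand_pos=np.array([0.3, 0.0]), objects=objects)
```

With the tighter gaze noise scale, the gaze cue dominates early prediction - consistent with the feature analysis reported in the abstract.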
We study the problem of graph structure identification, i.e., recovering the graph of dependencies among time series. We model these time series data as components of the state of a linear stochastic networked dynamical system. We assume partial observability, where the state evolution of only a subset of the nodes comprising the network is observed. We devise a new feature vector computed from the observed time series and prove that these features are linearly separable, i.e., there exists a hyperplane that separates the cluster of features associated with connected pairs of nodes from the cluster associated with disconnected pairs. This renders the features amenable to training a variety of classifiers for causal inference. In particular, we use these features to train convolutional neural networks (CNNs). The resulting causal inference mechanism outperforms state-of-the-art counterparts w.r.t. sample complexity. The trained CNNs generalize well over structurally distinct networks (dense or sparse) and noise-level profiles. Remarkably, they also generalize well to real-world networks when trained on synthetic networks (realizations of random graphs). Finally, the proposed method consistently reconstructs the graph in a pairwise manner, that is, by determining whether an edge or arrow exists between each pair of nodes from the corresponding pair of time series. This fits the framework of large-scale systems, where observing or processing all nodes in the network is prohibitive.
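The pipeline above computes a feature vector per node pair from the observed time series and classifies it as edge / no edge. A simplified stand-in for such features is a vector of lagged cross-correlations, which already separates connected from disconnected pairs in an easy case; the system below and the feature choice are illustrative assumptions, not the paper's provably separable construction.

```python
import numpy as np

def pair_features(x_src, x_dst, lags=3):
    """Feature vector for a node pair: cross-correlations of the
    (standardized) candidate-source series with the candidate-destination
    series shifted k steps into the future, k = 1..lags."""
    x_src = (x_src - x_src.mean()) / x_src.std()
    x_dst = (x_dst - x_dst.mean()) / x_dst.std()
    n = len(x_src)
    return np.array([np.dot(x_src[:n - k], x_dst[k:]) / (n - k)
                     for k in range(1, lags + 1)])

# Simulate a 3-node linear stochastic dynamical system: x_{t+1} = A x_t + w_t
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.4, 0.0],   # node 1 drives node 0; node 2 is isolated
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5]])
x, traj = np.zeros(3), []
for _ in range(5000):
    x = A @ x + rng.standard_normal(3)
    traj.append(x)
traj = np.array(traj)

f_connected = pair_features(traj[:, 1], traj[:, 0])     # 1 -> 0 exists
f_disconnected = pair_features(traj[:, 2], traj[:, 0])  # no 2 -> 0 edge
```

In this easy regime a threshold on the feature magnitude already separates the two classes; the paper's contribution is features that remain separable under partial observability, where such naive correlations break down.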
Model-agnostic meta-learning algorithms aim to infer priors from several observed tasks that can then be used to adapt to a new task with few examples. Given the inherent diversity of the tasks arising in existing benchmarks, recent methods use separate, learnable structures, such as hierarchies or graphs, to enable task-specific adaptation of the prior. While these approaches have produced significantly better meta-learners, our goal is to improve their performance when the heterogeneous task distribution contains challenging distribution shifts and semantic disparities. To this end, we introduce CAML (Contrastive Knowledge-Augmented Meta-Learning), a novel approach for knowledge-enhanced few-shot learning that evolves a knowledge graph to effectively encode historical experience, and employs a contrastive distillation strategy to leverage the encoded knowledge for task-aware modulation of the base learner. Using standard benchmarks, we evaluate the performance of CAML in different few-shot learning scenarios. In addition to the standard few-shot task adaptation, we also consider the more challenging multi-domain task adaptation and few-shot dataset generalization settings in our empirical studies. Our results show that CAML consistently outperforms the best-known approaches and achieves improved generalization.
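Contrastive objectives of the kind the distillation strategy above builds on are typically InfoNCE-style: each anchor representation should match its own positive against the rest of the batch. The sketch below shows that generic loss only, with random stand-in embeddings; it is not CAML's exact distillation loss.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE contrastive loss: cross-entropy of each anchor's
    similarity row, with its own positive on the diagonal."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
loss_aligned = info_nce(z, z + 0.01 * rng.standard_normal((8, 16)))
loss_random = info_nce(z, rng.standard_normal((8, 16)))
```

Aligned pairs yield a much lower loss than random pairings, which is the pressure that pulls the knowledge-graph encodings and the base learner's task representations together.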